2025 Social Media Engagement Strategy: A Data-Driven Review of Key Trends and Insights

A Deep Dive into Social Media Engagement of 2024

Author

Lovely Fernandez | C20305696

Published

March 28, 2025

Introduction

In today’s digitally connected world, social media platforms continue to dominate businesses’ preferences for reaching and engaging with their customers. As a result, influencers have emerged as crucial brand figures in marketing strategies, playing a vital role in connecting with their followers and shaping consumer opinions, preferences, and purchasing decisions.

Influencers’ unique approach to authentically promoting products, services and experiences has made them indispensable assets for brands seeking to expand their online presence, build brand awareness and accelerate business growth. The rise of influencer marketing has opened opportunities for businesses to interact directly with their target audiences, offering a more personalised, engaging, and effective alternative to traditional marketing methods.

Because businesses operate in a fast-paced and highly competitive market, it is essential for them to stay informed about the latest trends, the most widely used platforms, and strategies for leveraging influencer partnerships to achieve their marketing goals.

This dashboard analyses influencer activity, content performance, engagement, and view rates across platforms and regions, highlighting how these factors affect reach and revenue potential. The report aims to deliver data-driven insights that help businesses make informed marketing decisions in 2025.

Data Preparation

Code
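# Load and Clean Data

# packages used throughout the report (assumed from the functions called below)
library(tidyverse)    # read_csv, map_dfr, dplyr verbs, str_* helpers, ggplot2
library(VIM)          # kNN imputation
library(countrycode)  # country name standardisation
library(plotly)       # ggplotly interactive charts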

# load dataset and format the .csv files into one dataset
# path to dataset folder
data_path <- "Top 100 Influencers"

# list country folders from dataset folder
country_folders <- list.dirs(path = data_path, full.names = TRUE, recursive = FALSE)

# social media platforms to work with
platforms <- c("tiktok", "instagram", "youtube")

# column names
base_columns <- c("rank", "name", "followers", "engagement", "country", "topic", "reach")

# header labels in several languages, used to stop header rows being read in as data
known_header_names <- tolower(c(
  "name", "اسم", "الاسم", "نام", "имя", "ИМЕ", "namn", "名", "名稱", "ime", "nome", "nombre"
))

# list of merged data
merged_data <- list()

# load each country folder and combine the same platforms
for (platform in platforms) {
  
  # merge files in the same platform
  platform_data <- map_dfr(country_folders, function(folder) {
    # same platform files (platform_data_country.csv)
    files <- list.files(folder, pattern = paste0(platform, "_data_.*\\.csv$"), full.names = TRUE)
    
    data_list <- lapply(files, function(file) {
      df <- read_csv(file, col_names = FALSE, show_col_types = FALSE)
      
      # keep at most the first 7 columns, matching base_columns (ignore extra info)
      df <- df[, 1:min(ncol(df), length(base_columns))]
      
      # set column names
      colnames(df) <- base_columns[1:ncol(df)]
      
      # remove rows that are actually headers in various languages
      df <- df %>%
        filter(!tolower(name) %in% known_header_names)

      # metadata
      df$platform <- platform
      df$target_market <- basename(folder)
      
      return(df)
    })
    
    bind_rows(data_list)
  })
  
  # save to the merged_data
  merged_data[[platform]] <- platform_data
}

# separating platform data
tiktok_data <- merged_data[["tiktok"]]
instagram_data <- merged_data[["instagram"]]
youtube_data <- merged_data[["youtube"]]

# checking the data
# head(tiktok_data, 5)
# head(instagram_data, 5)
# head(youtube_data, 5)

# merge all 3 platforms into one dataset
# re-assert the "platform" column (already set while loading) before binding
tiktok_data$platform <- "tiktok"
instagram_data$platform <- "instagram"
youtube_data$platform <- "youtube"

all_platforms_data <- bind_rows(tiktok_data, instagram_data, youtube_data)

#view(all_platforms_data)
#head(all_platforms_data, 5)
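
The header-row filter used in the loader can be illustrated on a toy table. This is a minimal sketch in base R, assuming made-up rows and a shortened header list; `toy_headers`, `raw` and `cleaned` are illustrative names and are not part of the report's code.

```r
# shortened, hypothetical list of header labels (already lower case)
toy_headers <- c("name", "nome", "nombre")

# toy file contents: two stray header rows mixed in with real accounts
raw <- data.frame(
  rank = c("rank", "1", "2", "Rango", "1"),
  name = c("Name", "Alice @alice", "Bob @bob", "NOMBRE", "Carla @carla"),
  stringsAsFactors = FALSE
)

# keep only rows whose lower-cased name is not a known header label
cleaned <- raw[!tolower(raw$name) %in% toy_headers, ]
print(cleaned$name)
```

Only exact matches are removed, so an account whose name merely contains a header word is kept.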
Code
# all_platforms_data

# create a copy of original data - social media dataset
sm_dataset <- all_platforms_data

# exploring and understanding the data
str(sm_dataset)
tibble [17,635 × 9] (S3: tbl_df/tbl/data.frame)
 $ rank         : chr [1:17635] "1" "2" "3" "4" ...
 $ name         : chr [1:17635] "Ziba Gulley 🥀 @xzayx89" "Bryan Kazaka @bryankazaka" "JasleenKaur Official @jasleenxbeauty" "Pary Gull Moti @parygullofficial" ...
 $ followers    : chr [1:17635] "14.5M" "9M" "4.1M" "1.4M" ...
 $ engagement   : chr [1:17635] "0.3%" "-" "-" "-" ...
 $ country      : chr [1:17635] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
 $ topic        : chr [1:17635] "Business and Finance" "Entertainment and Music Finance Upskilling" "Beauty and Self Care Acting and Drama Celebrity" "Business and Finance" ...
 $ reach        : chr [1:17635] "4.4M" "2.7M" "1.2M" "420K" ...
 $ platform     : chr [1:17635] "tiktok" "tiktok" "tiktok" "tiktok" ...
 $ target_market: chr [1:17635] "afghanistan" "afghanistan" "afghanistan" "afghanistan" ...
Code
view(sm_dataset)
Code
# fixing data types
# convert numeric cols from char to numeric data types

convert_number <- function(x) {
  x <- toupper(x)
  x <- trimws(x)
  x <- ifelse(grepl("M", x), as.numeric(gsub("M", "", x)) * 1e6,
         ifelse(grepl("K", x), as.numeric(gsub("K", "", x)) * 1e3,
         as.numeric(x)))
  return(x)
}

# apply to followers and reach cols
sm_dataset$followers <- convert_number(sm_dataset$followers)
Warning in ifelse(grepl("M", x), as.numeric(gsub("M", "", x)) * 1e+06,
ifelse(grepl("K", : NAs introduced by coercion
Warning in ifelse(grepl("K", x), as.numeric(gsub("K", "", x)) * 1000,
as.numeric(x)): NAs introduced by coercion
Warning in ifelse(grepl("K", x), as.numeric(gsub("K", "", x)) * 1000,
as.numeric(x)): NAs introduced by coercion
Code
sm_dataset$reach <- convert_number(sm_dataset$reach)
Warning in ifelse(grepl("M", x), as.numeric(gsub("M", "", x)) * 1e+06,
ifelse(grepl("K", : NAs introduced by coercion
Warning in ifelse(grepl("M", x), as.numeric(gsub("M", "", x)) * 1e+06,
ifelse(grepl("K", : NAs introduced by coercion
Warning in ifelse(grepl("M", x), as.numeric(gsub("M", "", x)) * 1e+06,
ifelse(grepl("K", : NAs introduced by coercion
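
The coercion warnings above are most likely a side effect of how `ifelse()` works. It evaluates each branch over the whole vector, so `as.numeric(gsub("M", "", x))` is also applied to values such as "420K", which produces an `NA` that is never selected. The sketch below shows one way to avoid this by choosing a multiplier first and converting once. `convert_number_quiet` is a hypothetical name, and unlike the original it only recognises a suffix at the end of the string.

```r
# hypothetical, warning-free variant of the converter
convert_number_quiet <- function(x) {
  x <- toupper(trimws(x))
  # pick the multiplier from the trailing suffix: M = millions, K = thousands
  multiplier <- ifelse(grepl("M$", x), 1e6, ifelse(grepl("K$", x), 1e3, 1))
  # strip the suffix once; placeholders such as "-" become NA without a warning
  digits <- sub("[MK]$", "", x)
  suppressWarnings(as.numeric(digits)) * multiplier
}

converted <- convert_number_quiet(c("14.5M", "420K", "946500", " 1.2m ", "-"))
print(converted)
```

Genuinely non-numeric entries still come back as `NA`, so the later NA checks in the report would behave the same way.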
Code
# dropping the rank column as it is not needed (it only mirrors the ordering by follower count)
# alternative, kept for reference: convert rank to integer and re-rank by follower count
# sm_dataset$rank <- as.integer(sm_dataset$rank)
sm_dataset <- sm_dataset %>% select(-rank)

# clean and convert engagement to decimal format
sm_dataset$engagement <- gsub("%", "", sm_dataset$engagement) # remove %
sm_dataset$engagement <- trimws(tolower(sm_dataset$engagement)) # trim spaces, lowercase (in case)

sm_dataset$engagement <- ifelse(
  sm_dataset$engagement %in% c("-", "", "n/a", "na", "null", "—"),
  NA,
  sm_dataset$engagement
)

sm_dataset$engagement <- as.numeric(sm_dataset$engagement)

sm_dataset$engagement <- ifelse( sm_dataset$engagement > 100, 95.00, sm_dataset$engagement ) # adding this cap due to 100+ values for %

str(sm_dataset)
tibble [17,635 × 8] (S3: tbl_df/tbl/data.frame)
 $ name         : chr [1:17635] "Ziba Gulley 🥀 @xzayx89" "Bryan Kazaka @bryankazaka" "JasleenKaur Official @jasleenxbeauty" "Pary Gull Moti @parygullofficial" ...
 $ followers    : num [1:17635] 14500000 9000000 4100000 1400000 1300000 ...
 $ engagement   : num [1:17635] 0.3 NA NA NA NA NA NA NA NA NA ...
 $ country      : chr [1:17635] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
 $ topic        : chr [1:17635] "Business and Finance" "Entertainment and Music Finance Upskilling" "Beauty and Self Care Acting and Drama Celebrity" "Business and Finance" ...
 $ reach        : num [1:17635] 4400000 2700000 1200000 420000 390000 ...
 $ platform     : chr [1:17635] "tiktok" "tiktok" "tiktok" "tiktok" ...
 $ target_market: chr [1:17635] "afghanistan" "afghanistan" "afghanistan" "afghanistan" ...
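
The engagement clean-up steps above can be gathered into a single helper for reuse and quick checking. This is a sketch under the same rules as the report's code (strip "%", treat placeholders as missing, replace values above 100 with 95). The function name `clean_engagement` and the sample values are assumptions for illustration.

```r
# hypothetical helper bundling the engagement clean-up rules
clean_engagement <- function(x) {
  x <- trimws(tolower(gsub("%", "", x)))
  # placeholders that stand for "no value"
  x[x %in% c("-", "", "n/a", "na", "null")] <- NA
  x <- as.numeric(x)
  # values above 100% are treated as implausible and set to 95
  ifelse(x > 100, 95, x)
}

engagement_clean <- clean_engagement(c("0.3%", "-", " 12% ", "250%", "N/A"))
print(engagement_clean)
```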
Code
view(sm_dataset) # numeric columns are updated
Code
na_numeric_rows <- sm_dataset %>%
  filter(if_any(where(is.numeric), is.na))

print(na_numeric_rows)
# A tibble: 8,056 × 8
   name         followers engagement country topic  reach platform target_market
   <chr>            <dbl>      <dbl> <chr>   <chr>  <dbl> <chr>    <chr>        
 1 "Bryan Kaza…   9000000         NA Afghan… Ente… 2.70e6 tiktok   afghanistan  
 2 "JasleenKau…   4100000         NA Afghan… Beau… 1.20e6 tiktok   afghanistan  
 3 "Pary Gull …   1400000         NA Afghan… Busi… 4.20e5 tiktok   afghanistan  
 4 "Valentino …   1300000         NA Afghan… <NA>  3.90e5 tiktok   afghanistan  
 5 "QueenofTik…   1100000         NA Afghan… Fitn… 3.30e5 tiktok   afghanistan  
 6 "Mehral Sha…   1100000         NA Afghan… <NA>  3.30e5 tiktok   afghanistan  
 7 "haniyemaza…    946500         NA Afghan… <NA>  2.84e5 tiktok   afghanistan  
 8 "SlayerGirl…    928100         NA Afghan… Beau… 2.78e5 tiktok   afghanistan  
 9 "777zvrahfn…    860900         NA Afghan… Ente… 2.58e5 tiktok   afghanistan  
10 "Soma Zafar…    813800         NA Afghan… Trav… 2.44e5 tiktok   afghanistan  
# ℹ 8,046 more rows
Code
# handling NA values in engagement rate
# missing engagement values are filled in from similar rows (kNN imputation)
# similar influencers are found by country, platform, topic and follower count to give realistic engagement estimates
knn_temp_data <- sm_dataset %>% 
  select(engagement, followers, country, platform, topic)

knn_temp_data <- knn_temp_data %>%
  mutate(
    country = as.factor(country),
    platform = as.factor(platform),
    topic = as.factor(topic)
  )

knn_result <- kNN(knn_temp_data, variable = "engagement", k = 5, dist_var = c("followers", "country", "platform", "topic"))

sm_dataset$engagement <- knn_result$engagement # update values
Code
# follower count, engagement rate and reach are now cleaned, and the rank column has been dropped
na_numeric_rows <- sm_dataset %>%
  filter(if_any(where(is.numeric), is.na))

print(na_numeric_rows) # no numeric NA values
# A tibble: 0 × 8
# ℹ 8 variables: name <chr>, followers <dbl>, engagement <dbl>, country <chr>,
#   topic <chr>, reach <dbl>, platform <chr>, target_market <chr>
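
To make the idea behind the imputation step concrete, here is a deliberately simplified base R sketch. It is not the algorithm used by `VIM::kNN()`, which works with a Gower-type distance over several mixed variables. This version uses only follower count as the distance and fills each gap with the median engagement of the k closest accounts. The data and the name `impute_knn_1d` are made up for illustration.

```r
# toy data: engagement is missing for the last two accounts
toy <- data.frame(
  followers  = c(100, 120, 150, 900, 1000, 1100, 130, 950),
  engagement = c(5, 4, 6, 1, 2, 3, NA, NA)
)

# hypothetical one-dimensional nearest-neighbour imputer
impute_knn_1d <- function(df, k = 3) {
  known   <- df[!is.na(df$engagement), ]
  missing <- which(is.na(df$engagement))
  for (i in missing) {
    # distance to every account with a known engagement rate
    d <- abs(known$followers - df$followers[i])
    nearest <- order(d)[seq_len(min(k, nrow(known)))]
    # fill with the median of the k nearest neighbours
    df$engagement[i] <- median(known$engagement[nearest])
  }
  df
}

imputed <- impute_knn_1d(toy, k = 3)
print(imputed)
```

The small account picks up the engagement level of other small accounts and the large one that of other large accounts, which is the behaviour the report relies on when it matches influencers by country, platform, topic and follower count.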
Code
# cleaning remaining columns
# final decision: drop every row with an NA in a non-numeric column, as plenty of data remains

# count NA values per non-numeric column (may be important), then drop all rows with NAs
# na_non_numeric_rows <- sm_dataset %>%
  # filter(if_any(where(~ !is.numeric(.)), is.na))

colSums(is.na(sm_dataset[, sapply(sm_dataset, Negate(is.numeric))]))
         name       country         topic      platform target_market 
            0           217          9471             0             0 
Code
sm_dataset <- sm_dataset %>%
  drop_na()

colSums(is.na(sm_dataset)) # no NA values left
         name     followers    engagement       country         topic 
            0             0             0             0             0 
        reach      platform target_market 
            0             0             0 
Code
# lower case country names for easier comparison
sm_dataset <- sm_dataset %>%
  mutate(
    country = tolower(country),
    target_market = tolower(target_market)
  )

# check unique values to inspect the data; ChatGPT was used to give the English form/translation of non-English values

sm_dataset <- sm_dataset %>%
  mutate(country = case_when(
    country == "belgien" ~ "belgium",
    country == "индонезия" ~ "indonesia",
    country == "פרגוואי" ~ "paraguay",
    country == "برزیل" ~ "brazil",
    country == "رومانيا" ~ "romania",
    country == "ریاستہائے متحدہ" ~ "united states",
    country == "جنوبی کوریا" ~ "south korea",
    country == "بھارت" ~ "india",
    country == "کینیڈا" ~ "canada",
    country == "پیورٹو ریکو" ~ "puerto rico",
    country == "متحدہ عہدگی خصوصیں" ~ "united arab emirates",  # closest meaning
    country == "芬蘭" ~ "finland",
    country == "المغرب" ~ "morocco",
    country == "மாரக்கோ" ~ "morocco",
    country == "النرويج" ~ "norway",
    country == "slovaquie" ~ "slovakia",
    country == "reino unido" ~ "united kingdom",
    country == "संयुक्त अधिराज्य" ~ "united kingdom",
    country == "emirados árabes unidos" ~ "united arab emirates",
    country == "suurbritannia" ~ "united kingdom",
    country == "misri" ~ "egypt",
    country == "poljska" ~ "poland",
    country == "sverige" ~ "sweden",
    TRUE ~ country
  ))

# clean hyphens and lowercase
sm_dataset <- sm_dataset %>%
  mutate(country = tolower(str_replace_all(country, "-", " ")))

# standardise using countrycode
sm_dataset <- sm_dataset %>%
  mutate(target_market = countrycode(target_market, origin = "country.name", destination = "country.name"))

#unique(sm_dataset$country)
#unique(sm_dataset$target_market)
view(sm_dataset)
Code
# define topic categories
# ChatGPT was used to translate non-English topics so they could be assigned to the right categories
topic_categories <- list(
  "Entertainment" = c(
    "funny", "lustig", "romance", "wedding", "romantik", "hochzeit", 
    "comedy", "humor", "забавно", "забавление", "zabava i glazba", 
    "smišno", "roligt", "zabava i glazba smišno", "životinje smišno",
    "meelelahutus", "entertainment", "quotes", "TV Shows", "Movies",
    "television", "رقص", "lõbus", "طنز", "بسیار خنده دار",
    "الترفيه والموسيقى", "الترفيه والموسيقى رومانسية وزفاف", 
    "الترفيه والموسيقى مضحك موسيقى",
    "الترفيه والموسيقى الأزياء والإکسسوارات", "celebrity", "مشہور شخصیت پالتو جانوروں"
  ),
  "Music" = c(
    "music", "musik", "singer", "band", "songwriter", "dj", "producer", 
    "гласба", "певец", "музика", "glazba", "rap", "música", "musique", 
    "muziki", "音樂", "இசை", "موسیقی", "musique rock", "laulja", "सङ्गीत",
    "موسيقى", "الترفيه والموسيقى التسويق والإعلان موسيقى", 
    "الترفيه والموسيقى الأزياء والإکسسوارات موسيقى", 
    "الترفيه والموسيقى موسيقى منتجون ترفيه", 
    "الترفيه والموسيقى مغني موسيقى تألیف الأغانی"
  ),
  
  "Sports" = c(
    "sport", "sports", "football", "soccer", "cricket", "nba", 
    "mlb", "баскетбол", "тенис", "спорт", "jalgpall", "ice hockey",
    "basketball", "رياضة", "ورزش فوتبال ورزشکار", "ورزش اتومبیل فوتبال",
    "ورزش فوتبال", "فوتبال", "ورزش تجارت و امور مالی فوتبال", "ورزش",
    "تمويل مشهور كرة قدم مشاهير"
  ),
  
  "Technology" = c(
    "technology", "tech", "gadgets", "ai", "machine learning", "Tech"
  ),
  
  "Travel and Vlog" = c(
    "life", "travel", "nature", "outdoor", "reisen", "adventure", 
    "journey", "приключение", "пътувания", "resor", "natur", 
    "priroda", "viagem", "यात्रा", "سفر", "elustiil", "زندگی و جامعه مدلسازی"
  ),
  
  "Family" = c(
    "family", "familie", "област", "moms", "parenting", "familj", 
    "pere", "família", "lapsevanemlus", "perhe", "famille", 
    "família comida estilo de vida", "أمهات", "أطفال", "الأسرة", "Crianças"
  ),
  
  "Cuisine" = c(
    "food", "essen", "trinken", "drink", "cuisine", "recipe", 
    "напитки", "храна", "hrana", "chakula", "chakula na vinywaji", 
    "toit", "طعام", "الطعام والشراب"
  ),
  
  "Games" = c(
    "game", "gaming", "video", "videospiel", "игри", "esports", 
    "datorspel", "video gaming", "video games", "jeux vidéo", 
    "ویڈیو گیمنگ", "電子遊戲", "भिडियो गेमिङ", "ألعاب فيديو", "jogos de vídeo"
  ),
  "Classic Entertainment" = c(
    "acting", "drama", "celebrity", "prominenter", "schauspiel", 
    "zabavlenie", "muzika", "gluma", "актьорство", "драма", "знаменитост", 
    "שחקנות", "actors", "event", "actors film", "celebrities", 
    "näitlejad", "näitlemine", "kuulsus", "प्रसिद्धि", "مشهور شخصیت", "celebridad",
    "kuulsus ajakirjanikud", "تمثيل", "تمثيل والدراما", "ممثلين",
    "بازیگری و درام مد و لوازم جانبی مشهور بازیگران", "بازیگری و درام زندگی بازیگران", 
    "بازیگری و درام مشهور تجسم", "مشهور بازیگران مشاهیر طراحی", "مد بازیگران هنر", 
    "مشهور بازیگران مشاهیر طنز", "سرگرمی مجری تلویزیونی برنامه های تلویزیونی",
    "TV Host", "Atores", "Actor", "Animacija i Cosplay", "بازیگران", "مشهور"
  ),
  
  "Creative Arts" = c(
    "art", "animation", "diy", "hack", "fotografie", "home", "garden",
    "kunst", "handwerk", "photography", "painting", "illustrator", 
    "architecture", "анимация", "косплей", "umjetnost", "umjetnost i zanati",
    "arts and crafts", "konst", "arts and crafts home and garden", 
    "fotografering", "fotografija", "arts and crafts entertainment and music",
    "umjetnost i zanati zabava i glazba", "umjetnost i zanati zabava i glazba obitelj", 
    "藝術", "فن", "هنر زیبایی و مراقبت از خود مد و لوازم جانبی هنرهای تجربی", 
    "هنر مشهور زندگی بازیگران", "هنر بازیگران", "هنر خواننده زندگی", "Design Interior Design"
  ),
  
  "Automotive" = c(
    "auto", "vehicle", "automotive", "car", "fahrzeuge", "превозни", 
    "мотори", "racing", "motorcycle", "السيارات", "المركبات",
    "اتومبیل و سایر خودروها ورزش مرور اینترنتی بازیگران", "gun"
  ),
  
  "Pets" = c(
    "animal", "pet", "haustier", "tier", "животни", "кучета", 
    "kućni ljubimci", "kućni ljubимци забавление", 
    "مشهور شخصیت پالتو جانوروں", "زیبایی و مراقبت از خود مشاهیر حیوانات و خانمانی",
    "بسیار خنده دار بازیگران مشاهیر حیوانات و خانمانی", "مد و لوازم جانبی طنز مشاهیر حیوانات و خانمانی"
  ),
  
  "Beauty and Health" = c(
    "beauty", "care", "fashion", "fitness", "health", "accessories", 
    "accessoires", "schönheit", "selbstpflege", "hairstyle", "makeup", 
    "styling", "مودة", "грижа", "ljepota i briga za sebe", 
    "mode und accessoire", "skönhet och personlig vård", "skönheit", 
    "fashion and accessories", "mode och accessoarer", "moda i pribor",
    "modeling", "moda e acessórios", "estilo", "jameel", "güzellik",
    "لباس", "جمال", "الجمال والعناية الذاتية", "Skönheit",
    "مد مدلسازی", "تناسب اندام و سلامتی مدلسازی", "زیبایی و مراقبت از خود استایلینگ هنرمند",  
    "مد و لوازم جانبی زیبایی و مراقبت از خود بازاریابی و تبلیغات زندگی",
    "مد و لوازم جانبی خانواده مدلسازی مشاهیر", "اللياقة البدنية والصحة", "مشہور شخصیت",  
    "الأزياء والإكسسوارات", "الأزياء والإکسسوارات نمط الحياة", "أزياء", "Modelagem Moda",
    "Мода и аксесоари", "健身", "Skönhet"
  ),
  
  "Brands and Collaborations" = c(
    "product", "showcase", "brand", "sponsorship", "collaboration", 
    "представяния", "shopping", "marketing and advertising actors", 
    "masoko na utangazaji", "blogueiro", "criadores", 
    "marketing", "advertising", "مارکیٹنگ", "اشتہار",
    "عرض المنتج", "promotions", "Predstavljanje proizvoda", 
    "مد و لوازم جانبی ویدئوبلاگ نویس"
  ),
  
  "Education" = c(
    "education", "bildung", "business", "finance", "vehicle", "auto", 
    "automotive", "news", "politik", "обучение", "образование", 
    "повишаване", "qualifikation", "новини", "upskilling", 
    "utbildning", "coaching", "marketing and advertising actors coaching", 
    "vitabu", "politics", "política", "politique", "समाचारहरू", 
    "tax", "آموزش", "تعليم", "تعليم الحياة والمجتمع", 
    "آموزش ارتقای مهارت ها مشهور مشاهیر", "Obrazovanje", "Obrazovanje Usavršavanje", 
    "أخبار", "Negócios e Finanças", "Negócios e Finanças Finanças Empreendedor Modelagem", 
    "Book", "صحافيين", "Journalists", 
    "سیاست", "الأعمال والتمویل التمويل الشخصي", "الأعمال والتمويل"
  )
)
Code
# categorising content topics into simpler categories due to the high number of unique values (keywords above)

temp_dataset <- sm_dataset # make a safe copy
# unique(temp_dataset$topic)

# category assignment
assign_category <- function(topic, pattern_map) {
  for (category in names(pattern_map)) {
    patterns <- pattern_map[[category]]
    for (pattern in patterns) {
      if (grepl(pattern, topic, ignore.case = TRUE)) {
        return(category)
      }
    }
  }
  return(topic)  # original topic
}

# apply to temp dataset
clean_topics <- str_trim(temp_dataset$topic)
simplified_categories <- sapply(clean_topics, assign_category, pattern_map = topic_categories)

# assign new categories to sm_dataset$topic
temp_dataset$topic <- simplified_categories
sm_dataset$topic <- temp_dataset$topic

# unique(temp_dataset$topic)
# unique(sm_dataset$topic)
# view(sm_dataset)

# sm_dataset
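
One thing worth checking in the category assignment is that `grepl()` matches substrings, and the first category with any hit wins. A topic such as "Beauty and Self Care", if it appears on its own, would match the keyword "car" in the Automotive list before the Beauty and Health list is reached. The sketch below shows the difference between substring and whole-word matching. `match_whole_word` is a hypothetical helper, and word boundaries for non-Latin scripts may behave differently and would need separate testing.

```r
topic <- "Beauty and Self Care"

# plain substring matching: "car" is found inside "Care"
substring_hit <- grepl("car", topic, ignore.case = TRUE)

# hypothetical helper: match the keyword only as a whole word
match_whole_word <- function(keyword, topic) {
  grepl(paste0("\\b", keyword, "\\b"), topic, ignore.case = TRUE, perl = TRUE)
}

whole_word_hit <- match_whole_word("car", topic)   # "car" is not a whole word here
care_hit       <- match_whole_word("care", topic)  # "care" is

print(c(substring = substring_hit, whole_word = whole_word_hit, care = care_hit))
```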
Code
# creating a new column - view rate
# this shows the share of followers reached: reach / followers, as a percentage
sm_dataset$view_rate <- (sm_dataset$reach / sm_dataset$followers) * 100

sm_dataset$view_rate <- ifelse(
  is.na(sm_dataset$view_rate) | is.infinite(sm_dataset$view_rate),
  NA,
  sm_dataset$view_rate
)

sm_dataset$view_rate <- ifelse(sm_dataset$view_rate > 100, 95.00, sm_dataset$view_rate)

# sm_dataset
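
As a quick worked check of the view-rate calculation, the same rules can be wrapped in a small function and run on a few hand-picked numbers. The function name `view_rate` and the sample inputs are assumptions for illustration. The first two pairs are the follower and reach figures of the first two accounts shown earlier.

```r
# hypothetical helper mirroring the view-rate rules used in the report
view_rate <- function(reach, followers) {
  rate <- (reach / followers) * 100
  # division by zero or missing inputs give no usable rate
  rate[is.na(rate) | is.infinite(rate)] <- NA
  # rates above 100% are replaced with 95, as in the report
  ifelse(rate > 100, 95, rate)
}

rates <- view_rate(
  reach     = c(4400000, 2700000, 500, 10),
  followers = c(14500000, 9000000, 400, 0)
)
print(round(rates, 1))
```

The third pair reaches more people than it has followers and is therefore set to 95, while the fourth has zero followers and ends up as `NA`.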
Code
# correctly formatting the dataset for better visual
smd_insights <- sm_dataset # copy finalised dataset

# rename columns
smd_insights <- smd_insights %>%
  rename(
    `Account` = name,
    `Follower Count` = followers,
    `Engagement %` = engagement,
    `Country Base` = country,
    `Content Topic` = topic,
    `Reached Audience` = reach,
    `Media Platform` = platform,
    `Target Market` = target_market,
    `View %` = view_rate
  )

# reorder columns
smd_insights <- smd_insights %>%
  select(
    `Account`,
    `Follower Count`,
    `Reached Audience`,
    `Engagement %`,
    `View %`,
    `Content Topic`,
    `Country Base`,
    `Target Market`,
    `Media Platform`
  )

# updating country and platform values to title case
smd_insights <- smd_insights %>%
  mutate(
    `Country Base` = str_to_title(`Country Base`),
    `Target Market` = str_to_title(`Target Market`),
    `Media Platform` = str_to_title(`Media Platform`)
  )

# standardising country names in smd_insights to match world map data
smd_insights <- smd_insights %>%
  mutate(
    `Target Market` = case_when(
      `Target Market` == "United Kingdom" ~ "UK",
      `Target Market` == "United States" ~ "USA",
      `Target Market` == "Hong Kong Sar China" ~ "China",
      `Target Market` == "Czechia" ~ "Czech Republic",
      TRUE ~ `Target Market`  # Keep other countries as they are
    )
  )

# platform colours, including 'Unknown' values
platform_colors <- c(
  "Instagram" = "#D02A7B",  
  "Tiktok" = "#69c9D0",     
  "Youtube" = "#FF0400",  
  "Unknown" = "#708090"        
)


head(smd_insights, 5)
# A tibble: 5 × 9
  Account            `Follower Count` `Reached Audience` `Engagement %` `View %`
  <chr>                         <dbl>              <dbl>          <dbl>    <dbl>
1 "Ziba Gulley \U00…         14500000            4400000            0.3     30.3
2 "Bryan Kazaka @br…          9000000            2700000            0.2     30  
3 "JasleenKaur Offi…          4100000            1200000            0.2     29.3
4 "Pary Gull Moti @…          1400000             420000            0.1     30  
5 "QueenofTikTok\U0…          1100000             330000            0.2     30  
# ℹ 4 more variables: `Content Topic` <chr>, `Country Base` <chr>,
#   `Target Market` <chr>, `Media Platform` <chr>

Exploratory Data Analysis

Code
# check dataset
# smd_insights

# reformat follower count for easy understanding
format_followers <- function(x) {
  if (x >= 1e6) {
    paste0(round(x / 1e6, 1), "M")
  } else if (x >= 1e3) {
    paste0(round(x / 1e3, 1), "K")
  } else {
    as.character(x)
  }
}
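
Because `format_followers()` uses `if`, it handles one value at a time, which is why the charts call it through `sapply()`. Passing a whole vector to `if` is an error in recent R versions. A vectorised variant could look like the sketch below; `format_followers_vec` is a hypothetical name, not part of the report.

```r
# hypothetical vectorised variant of the follower formatter
format_followers_vec <- function(x) {
  ifelse(
    x >= 1e6, paste0(round(x / 1e6, 1), "M"),
    ifelse(x >= 1e3, paste0(round(x / 1e3, 1), "K"), as.character(x))
  )
}

follower_labels <- format_followers_vec(c(14500000, 946500, 500))
print(follower_labels)
```

With this form a whole column can be formatted in one call, for example inside `mutate()`.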

Leading Social Media Influencers by Country

Code
# 1. Map Chart

# identify top accounts per market
top_accounts_per_market <- smd_insights %>%
  group_by(`Target Market`) %>%
  slice_max(order_by = `Follower Count`, n = 1, with_ties = FALSE) %>%
  ungroup()

# world map data
world <- map_data("world")

# merging world map with top accounts data
world_data <- left_join(world, top_accounts_per_market, by = c("region" = "Target Market"))
# replace every missing value (e.g. regions with no matched account) with "Unknown"
world_data[is.na(world_data)] <- "Unknown"

# create map
map <- ggplot(world_data) +
  geom_polygon(
    aes(
      x = long, y = lat, group = group, fill = `Media Platform`,
      text = paste0(
        "<b>Account:</b> ", `Account`, "<br>",
        "<b>Followers:</b> ", `Follower Count`, "<br>",
        "<b>Content:</b> ", `Content Topic`, "<br>",
        "<b>Target Market:</b> ", region, "<br>",
        "<b>Platform:</b> ", `Media Platform`
      )
    ),
    color = "black", linewidth = 0.1
  ) +
  scale_fill_manual(
    values = platform_colors,
    drop = FALSE
  ) +
  theme_void() +
  labs(
    title = "Leading Social Media Influencers by Country",
    fill = "Platform"
  ) +
  theme(
    plot.title = element_text(face = "bold")
  )
Warning in geom_polygon(aes(x = long, y = lat, group = group, fill = `Media
Platform`, : Ignoring unknown aesthetics: text
Code
interactive_map <- ggplotly(map, tooltip = "text")
interactive_map

The map visualisation provides a global view of the social media landscape, showing which of TikTok, Instagram, and YouTube hosts the top influencer in each country. Each country (target market) is colour-coded by the platform of its most-followed account, giving a clear indication of which platform dominates influencer activity on a per-country basis.

In this dataset, Instagram appears to be the dominant platform globally, with the top influencer in the majority of countries being on Instagram. YouTube, however, leads in only three countries, among them Finland and Saudi Arabia. This may be due to regional differences in platform access or content preferences. In Finland, for instance, a preference for gaming content may explain YouTube’s strong position.

As for TikTok, it dominates in countries such as Afghanistan, Italy, Romania, Thailand, Ecuador, Peru, and Bolivia. This suggests that TikTok is more popular in these regions, potentially due to its short-form video format and its appeal to younger audiences.

The map also highlights global trends in influencer marketing, allowing businesses to identify which platforms are most relevant to their target audiences in each region.
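
The claim about which platform leads in most countries can be checked numerically by counting how many markets each platform wins. The following is a minimal base R sketch on made-up data; the table `toy` and its column names are assumptions. In the report the same count could be taken from the per-market top accounts.

```r
# toy data: a few accounts across four hypothetical markets
toy <- data.frame(
  market    = c("A", "A", "B", "B", "C", "D"),
  platform  = c("Instagram", "Tiktok", "Youtube", "Instagram", "Instagram", "Instagram"),
  followers = c(10, 20, 5, 3, 7, 9),
  stringsAsFactors = FALSE
)

# keep the most-followed account in each market
top_per_market <- do.call(
  rbind,
  lapply(split(toy, toy$market), function(d) d[which.max(d$followers), ])
)

# count how many markets each platform leads
platform_counts <- table(top_per_market$platform)
print(platform_counts)
```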

Top Influencers per Platform

Code
# top 30 YouTube influencers (largest account per base country, ranked by followers)
yt_filtered <- smd_insights %>%
  filter(`Media Platform` == "Youtube") %>%
  group_by(`Country Base`) %>%
  slice_max(order_by = `Follower Count`, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(desc(`Follower Count`)) %>%
  slice_head(n = 30)

yt_filtered$Rank <- 1:nrow(yt_filtered)
yt_filtered$Account <- make.unique(as.character(yt_filtered$Account))
yt_filtered$Rank <- factor(yt_filtered$Rank, levels = rev(yt_filtered$Rank))

yt_filtered$`Followers Label` <- sapply(yt_filtered$`Follower Count`, format_followers)
yt_filtered$ShortName <- sub("[^a-zA-Z0-9]+.*$", "", yt_filtered$Account)

# create chart
yt_barchart <- ggplot(yt_filtered, aes(
  x = Rank,
  y = `Follower Count`,
  text = paste0(
    "<b>Rank:</b> ", Rank, "<br>",
    "<b>Account:</b> ", Account, "<br>",
    "<b>Followers:</b> ", `Followers Label`, "<br>",
    "<b>Content:</b> ", `Content Topic`, "<br>",
    "<b>Market:</b> ", `Target Market`
  )
)) +
  coord_flip() +
  geom_bar(stat = "identity", fill = "#800000", width = 0.8) +
  scale_y_continuous(
    labels = scales::label_number(scale_cut = scales::cut_short_scale()),
    expand = expansion(mult = c(0, 0.15))  # space on right
  ) +
  labs(
    title = "Top 30 YouTube Influencers",
    x = "Rank",
    y = "Number of Followers"
  ) +
  theme_minimal() +
  theme(
    plot.margin = margin(5.5, 100, 5.5, 5.5),  
    axis.title.x = element_text(size = 8),
    axis.title.y = element_text(size = 8),
    axis.text.y = element_text(size = 7),
    axis.text.x = element_text(size = 10),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

yt_top_influencers_barchart_interactive <- ggplotly(yt_barchart, tooltip = "text")

# add name labels to the RIGHT of the bar
yt_top_influencers_barchart_interactive <- yt_top_influencers_barchart_interactive %>%
  layout(annotations = lapply(1:nrow(yt_filtered), function(i) {
    list(
      x = yt_filtered$`Follower Count`[i] + max(yt_filtered$`Follower Count`) * 0.01,
      y = nrow(yt_filtered) - (i-1),  
      text = yt_filtered$ShortName[i],
      xref = "x",
      yref = "y",
      showarrow = FALSE,
      font = list(size = 10, color = "black"),
      xanchor = "left",
      align = "left"
    )
  }))


yt_top_influencers_barchart_interactive
Code
# top 30 Instagram influencers (largest account per base country, ranked by followers)
ig_filtered <- smd_insights %>%
  filter(`Media Platform` == "Instagram") %>%
  group_by(`Country Base`) %>%
  slice_max(order_by = `Follower Count`, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(desc(`Follower Count`)) %>%
  slice_head(n = 30)

ig_filtered$Rank <- 1:nrow(ig_filtered)
ig_filtered$Account <- make.unique(as.character(ig_filtered$Account))
ig_filtered$Rank <- factor(ig_filtered$Rank, levels = rev(ig_filtered$Rank))
ig_filtered$`Followers Label` <- sapply(ig_filtered$`Follower Count`, format_followers)
ig_filtered$ShortName <- sub("[^a-zA-Z0-9]+.*$", "", ig_filtered$Account)

ig_barchart <- ggplot(ig_filtered, aes(
  x = Rank,
  y = `Follower Count`,
  text = paste0(
    "<b>Rank:</b> ", Rank, "<br>",
    "<b>Account:</b> ", Account, "<br>",
    "<b>Followers:</b> ", `Followers Label`, "<br>",
    "<b>Content:</b> ", `Content Topic`, "<br>",
    "<b>Market:</b> ", `Target Market`
  )
)) +
  coord_flip() +
  geom_bar(stat = "identity", fill = "#C13584", width = 0.8) +  # Instagram pink
  scale_y_continuous(
    labels = scales::label_number(scale_cut = scales::cut_short_scale()),
    expand = expansion(mult = c(0, 0.15))
  ) +
  labs(
    title = "Top 30 Instagram Influencers",
    x = "Rank",
    y = "Number of Followers"
  ) +
  theme_minimal() +
  theme(
    plot.margin = margin(5.5, 100, 5.5, 5.5),
    axis.title.x = element_text(size = 8),
    axis.title.y = element_text(size = 8),
    axis.text.y = element_text(size = 7),
    axis.text.x = element_text(size = 10),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

ig_top_influencers_barchart_interactive <- ggplotly(ig_barchart, tooltip = "text")

ig_top_influencers_barchart_interactive <- ig_top_influencers_barchart_interactive %>%
  layout(annotations = lapply(1:nrow(ig_filtered), function(i) {
    list(
      x = ig_filtered$`Follower Count`[i] + max(ig_filtered$`Follower Count`) * 0.01,
      y = nrow(ig_filtered) - (i - 1),
      text = ig_filtered$ShortName[i],
      xref = "x",
      yref = "y",
      showarrow = FALSE,
      font = list(size = 10, color = "black"),
      xanchor = "left",
      align = "left"
    )
  }))

ig_top_influencers_barchart_interactive
Code
# top 30 TikTok influencers (largest account per base country, ranked by followers)
tt_filtered <- smd_insights %>%
  filter(`Media Platform` == "Tiktok") %>%
  group_by(`Country Base`) %>%
  slice_max(order_by = `Follower Count`, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(desc(`Follower Count`)) %>%
  slice_head(n = 30)

tt_filtered$Rank <- 1:nrow(tt_filtered)
tt_filtered$Account <- make.unique(as.character(tt_filtered$Account))
tt_filtered$Rank <- factor(tt_filtered$Rank, levels = rev(tt_filtered$Rank))
tt_filtered$`Followers Label` <- sapply(tt_filtered$`Follower Count`, format_followers)
tt_filtered$ShortName <- sub("[^a-zA-Z0-9]+.*$", "", tt_filtered$Account)

tt_barchart <- ggplot(tt_filtered, aes(
  x = Rank,
  y = `Follower Count`,
  text = paste0(
    "<b>Rank:</b> ", Rank, "<br>",
    "<b>Account:</b> ", Account, "<br>",
    "<b>Followers:</b> ", `Followers Label`, "<br>",
    "<b>Content:</b> ", `Content Topic`, "<br>",
    "<b>Market:</b> ", `Target Market`
  )
)) +
  coord_flip() +
  geom_bar(stat = "identity", fill = "#69c9D0", width = 0.8) +  # TikTok cyan
  scale_y_continuous(
    labels = scales::label_number(scale_cut = scales::cut_short_scale()),
    expand = expansion(mult = c(0, 0.15))
  ) +
  labs(
    title = "Top 30 TikTok Influencers",
    x = "Rank",
    y = "Number of Followers"
  ) +
  theme_minimal() +
  theme(
    plot.margin = margin(5.5, 100, 5.5, 5.5),
    axis.title.x = element_text(size = 8),
    axis.title.y = element_text(size = 8),
    axis.text.y = element_text(size = 7),
    axis.text.x = element_text(size = 10),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

tt_top_influencers_barchart_interactive <- ggplotly(tt_barchart, tooltip = "text")

tt_top_influencers_barchart_interactive <- tt_top_influencers_barchart_interactive %>%
  layout(annotations = lapply(1:nrow(tt_filtered), function(i) {
    list(
      x = tt_filtered$`Follower Count`[i] + max(tt_filtered$`Follower Count`) * 0.01,
      y = nrow(tt_filtered) - (i - 1),
      text = tt_filtered$ShortName[i],
      xref = "x",
      yref = "y",
      showarrow = FALSE,
      font = list(size = 10, color = "black"),
      xanchor = "left",
      align = "left"
    )
  }))

tt_top_influencers_barchart_interactive

Each chart offers insights into the top-performing influencers on these platforms, with detailed information on their content and audience. By analysing the top influencers on these platforms, brands can better understand who is driving the most engagement in each country and use this information to inform their advertising strategy.

By identifying who the trending influencers are on each platform, brands can align their advertising campaigns with the right content creators who already resonate with their target audience. If a brand is focused on sports, partnering with Cristiano Ronaldo on Instagram would be highly effective. If they are targeting a younger, entertainment-focused audience, working with MrBeast or Khaby Lame would be more appropriate.

This data-driven approach ensures that brands are not just targeting influencers, but the right influencers, whose platforms and content align with the brand’s messaging and audience preferences, maximising the effectiveness of their marketing.

CPV Comparison Across Platforms

Code
# 5. CPV Line Chart
# apply CPV estimates and calculate earnings (cpv searched online)
cpv_chart_data <- smd_insights %>%
  filter(!is.na(`Reached Audience`), !is.na(`Media Platform`)) %>%
  mutate(CPV = case_when(
    `Media Platform` == "Youtube" ~ 20,
    `Media Platform` == "Instagram" ~ 5.5,
    `Media Platform` == "Tiktok" ~ 0.03,
    TRUE ~ NA_real_
  )) %>%
  filter(!is.na(CPV)) %>%
  mutate(
    Estimated_Revenue = (`Reached Audience` / 1000) * CPV
  )

# top earning account per platform + reach
revenue_by_reach <- cpv_chart_data %>%
  group_by(`Media Platform`, `Reached Audience`) %>%
  slice_max(order_by = Estimated_Revenue, n = 1) %>%
  ungroup() %>%
  filter(!is.na(`Reached Audience`), !is.na(Estimated_Revenue), Estimated_Revenue > 0)

# format numbers nicely
format_number_short <- function(x) {
  ifelse(is.na(x), NA,
    ifelse(x >= 1e9, paste0(round(x / 1e9, 1), "B"),
    ifelse(x >= 1e6, paste0(round(x / 1e6, 1), "M"),
    ifelse(x >= 1e3, paste0(round(x / 1e3, 1), "K"),
           round(x, 0)))))
}

# format hover labels
revenue_by_reach <- revenue_by_reach %>%
  mutate(
    Reach_Label = format_number_short(`Reached Audience`),
    Revenue_Label = paste0("$", format_number_short(Estimated_Revenue)),
    Hover_Label = paste0(
      "<b>Platform:</b> ", `Media Platform`, "<br>",
      "<b>Account:</b> ", Account, "<br>",
      "<b>Reach:</b> ", Reach_Label, "<br>",
      "<b>Revenue:</b> ", Revenue_Label
    )
  )

# total revenue per platform
total_rev <- cpv_chart_data %>%
  group_by(`Media Platform`) %>%
  summarise(Total_Revenue = sum(Estimated_Revenue, na.rm = TRUE))

# create line chart
revenue_chart <- ggplot(revenue_by_reach, aes(
  x = `Reached Audience`,
  y = Estimated_Revenue,
  color = `Media Platform`,
  group = `Media Platform`,
  text = Hover_Label
)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  scale_color_manual(values = c(
    "Youtube" = "#FF0000",
    "Tiktok" = "#69c9D0",
    "Instagram" = "#D02A7B"
  )) +
  scale_x_continuous(labels = scales::label_number(scale_cut = scales::cut_short_scale())) +
  scale_y_continuous(labels = scales::label_dollar(scale_cut = scales::cut_short_scale())) +
  labs(
    title = "CPV Comparison Across Platforms",
    x = "Reached Audience",
    y = "Estimated Revenue ($ USD)",
    color = "Platform"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.title.x = element_text(size = 8),
    axis.title.y = element_text(size = 8),
    axis.text.y = element_text(size = 7),
    axis.text.x = element_text(size = 10),
    legend.position = "bottom"
  )

# HTML-style summary label (searched chatgpt how to do)
summary_text <- total_rev %>%
  mutate(
    Pretty_Platform = case_when(
      `Media Platform` == "Youtube" ~ "<span style='color:#FF0000; font-weight:bold;'>YouTube</span>",
      `Media Platform` == "Instagram" ~ "<span style='color:#D02A7B; font-weight:bold;'>Instagram</span>",
      `Media Platform` == "Tiktok" ~ "<span style='color:#69c9D0; font-weight:bold;'>TikTok</span>"
    ),
    Label = paste0(Pretty_Platform, ": $", format_number_short(Total_Revenue))
  ) %>%
  pull(Label) %>%
  paste(collapse = "<br>")

# interactive plot with floating annotation
interactive_revenue_chart <- ggplotly(revenue_chart, tooltip = "text") %>%
  layout(
    annotations = list(
      list(
        x = 0.9,
        y = 0.2,
        xref = "paper",
        yref = "paper",
        showarrow = FALSE,
        align = "right",
        text = summary_text,
        font = list(size = 10),
        xanchor = "right",
        yanchor = "top"
      )
    )
  )

interactive_revenue_chart

The graph above presents a Cost Per View (CPV) comparison across Instagram, TikTok, and YouTube, showing the estimated revenue generated relative to the reached audience for each platform. Despite differences in content usage and platform popularity, the CPV comparison reveals interesting insights into how each platform monetises its audience.
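As a rough illustration of the arithmetic behind the chart, the sketch below applies the same rates used in the code above, which are divided by 1,000 reached users before being multiplied out. The helper name `estimate_revenue` and the reach of 2 million are assumptions for illustration only, not figures from the dataset.

```r
# Rates copied from the analysis code above (applied per 1,000 reached users)
cpv_rates <- c(Youtube = 20, Instagram = 5.5, Tiktok = 0.03)

# Hypothetical helper: estimated revenue = (reach / 1000) * platform rate
estimate_revenue <- function(reach, platform) {
  rate <- cpv_rates[platform]  # NA for platforms without an assumed rate
  unname((reach / 1000) * rate)
}

# Made-up reach of 2 million on each platform
example_reach <- 2e6
revenue <- sapply(names(cpv_rates), function(p) estimate_revenue(example_reach, p))
print(revenue)
```

With identical reach, the gap between platforms comes entirely from the assumed rate, which is why the lines in the chart diverge so sharply.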

Although YouTube reaches a much smaller audience through influencer content, it shows a significantly higher CPV, with an estimated revenue of $21M. This indicates that YouTube’s CPV is one of the highest in the market, ranging from $10 to $15 per view. The platform’s established monetisation model, which has been refined over the years, provides higher returns per view, especially for long-form content.

Instagram, as the middle ground between the two, shows a more balanced CPV at around $5 to $6 per view, with an estimated revenue of $75.5M from a broader audience reach. This is consistent with Instagram’s role as a leading social media platform with a variety of content types, from stories and posts to sponsored ads and IGTV. Instagram’s established presence and high engagement rates give it a healthy CPV return, making it a strong contender for brands that want to reach both large and niche audiences.

TikTok has the lowest CPV, generating approximately $74K in estimated revenue from its audience reach. The CPV on TikTok is between $0.02 and $0.05 per view, which reflects the platform’s growing status and its focus on short-form, viral content. While TikTok is still in its growth phase compared to YouTube and Instagram, its potential for explosive growth, especially among younger demographics, presents significant opportunities for brands targeting Gen Z and millennial audiences. TikTok’s ability to reach massive audiences at a lower cost per view makes it an attractive option for businesses looking to run high-volume campaigns. Brands looking to optimise their advertising strategy can leverage these CPV insights to tailor their campaigns.
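To make the cost comparison concrete, the sketch below inverts the revenue formula from the code above and asks how much reach a fixed amount would correspond to on each platform. Treating the rates as an advertiser's cost, and the budget of $1,000, are assumptions made purely for illustration.

```r
# Rates copied from the analysis code above (applied per 1,000 reached users)
cpv_rates <- c(Youtube = 20, Instagram = 5.5, Tiktok = 0.03)

# Hypothetical campaign budget in USD (made-up figure)
budget <- 1000

# Reach implied by the budget: (budget / rate) * 1000
implied_reach <- (budget / cpv_rates) * 1000

# Express each platform's implied reach relative to YouTube
relative_to_youtube <- implied_reach / implied_reach["Youtube"]

print(round(implied_reach))
print(round(relative_to_youtube, 1))
```

Under these assumed rates, the same budget corresponds to several hundred times more reach on TikTok than on YouTube, which is the trade-off between volume and return per view described above.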

Big Idea

In a highly competitive digital landscape, understanding what content and which regions drive the best engagement and view rates is crucial for developing a targeted, data-driven marketing strategy. The key to maximising a brand’s impact lies in identifying which types of content resonate most with audiences across different platforms, as well as understanding the regional nuances that drive engagement.

Content Types by Engagement and View

Code
# 1. Dominant Content by View and Engagement Rate  (Shows which content reached most users)

# summarise per content type
content_summary <- smd_insights %>%
  filter(!is.na(`Content Topic`), !is.na(`View %`), !is.na(`Engagement %`), !is.na(`Reached Audience`)) %>%
  group_by(`Content Topic`) %>%
  summarise(
    Est_View_Count = sum(`Reached Audience` * (`View %` / 100), na.rm = TRUE),
    Est_Engage_Count = sum(`Reached Audience` * (`Engagement %` / 100), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(Content = reorder(`Content Topic`, Est_Engage_Count + Est_View_Count))

stacked_data <- content_summary %>%
  select(Content, Est_Engage_Count, Est_View_Count) %>%
  pivot_longer(cols = c(Est_Engage_Count, Est_View_Count), names_to = "Metric", values_to = "Value") %>%
  mutate(Metric = recode(Metric, 
                         Est_Engage_Count = "Engagement Rate",
                         Est_View_Count = "View Rate"))
# plot with clean legend
stacked_bar_chart <- ggplot(stacked_data, aes(
  x = Content,
  y = Value,
  fill = Metric,
  text = paste0(
    "<b>Content:</b> ", Content, "<br>",
    "<b>Metric:</b> ", Metric, "<br>",
    "<b>Value:</b> ", format(round(Value), big.mark = ",")
  )
)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(
    values = c("Engagement Rate" = "#FF6347", "View Rate" = "#008080")
  ) +
  scale_y_continuous(labels = scales::label_number(scale_cut = scales::cut_short_scale())) +
  labs(
    title = "Content Types by Engagement and View",
    x = NULL,
    y = "Estimated Audience Count",
    fill = NULL
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.text.y = element_text(size = 9)
  )

interactive_stacked_chart <- ggplotly(stacked_bar_chart, tooltip = "text")
interactive_stacked_chart

The chart above visualises the engagement rate and view rate for different content types, ordered by estimated audience count. The entertainment category stands out as the dominant content type, showing the highest engagement relative to its audience size. This suggests that entertainment content not only reaches large audiences but also generates significant interactions, making it an ideal choice for brands aiming to engage users at a deeper level.
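The bars are built by converting each account's percentage rates into estimated head counts and summing them per content type. A minimal sketch of that calculation on made-up rows is shown here; the data frame `toy` and its column names are hypothetical stand-ins for the dataset's columns.

```r
# Made-up rows using the same column meanings as the analysis
toy <- data.frame(
  topic      = c("Entertainment", "Entertainment", "Music"),
  reach      = c(1000000, 500000, 800000),
  view_pct   = c(40, 20, 50),
  engage_pct = c(5, 10, 2)
)

# Convert percentages into estimated counts per account
toy$est_views   <- toy$reach * toy$view_pct / 100
toy$est_engaged <- toy$reach * toy$engage_pct / 100

# Sum the estimates per content topic, as the stacked bars do
content_totals <- aggregate(cbind(est_views, est_engaged) ~ topic,
                            data = toy, FUN = sum)
print(content_totals)
```

Because the totals are weighted by reach, a topic with many large accounts can dominate the chart even when its percentage rates are modest.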

Sports and Music follow closely behind, with Sports content demonstrating a balanced performance in both engagement and view metrics. While Music has a higher view rate, its engagement rate is slightly lower. This indicates that while music content reaches a large audience, its interaction rate may not be as high as entertainment or sports content, though still valuable for brands targeting these specific groups.

Other content types such as Beauty and Health, Automotive, and Travel and Vlog show a more balanced distribution between view and engagement rates. While their audience reach may not be as large as entertainment or sports, they still offer opportunities for brands to target niche markets, particularly in the beauty, health, and automotive industries. Despite lower engagement rates, these categories represent untapped potential for brands seeking to engage specific audiences.

America

Code
continent_charts$Asia
Code
continent_charts$Europe
Code
continent_charts$Africa
Code
continent_charts$Oceania

The scatter plots above reveal how content types perform across different regions in terms of engagement rate and view rate, offering key insights into regional content preferences.

In Asia, Entertainment content stands out with a high engagement rate, though it does not necessarily have the highest view rate. This suggests that while entertainment-related content draws significant interactions, it may not always reach as many viewers as other content categories. Music also shows a considerable level of engagement. Sports follows, with moderate engagement and view rates. This region shows a blend of popular, globally recognised content types like entertainment and music, with more niche categories such as sports.

In Europe, Entertainment content dominates, showing both a solid engagement rate and view rate, meaning it appeals to large audiences and generates significant interaction. Travel and Vlog content stands out with a balanced performance in both metrics, indicating a growing interest in travel-based content across the region.

North America sees Entertainment as the most engaging content type, with a notable engagement rate and a substantial view rate, indicating that entertainment content is not only widespread but also generates significant audience interaction. Beauty and Health content performs strongly, especially in terms of engagement, reflecting ongoing trends in the wellness and beauty industry. Sports, Music, and Technology also show considerable engagement, making North America an important region for brands in sports and tech-related content.

In South America, Entertainment content once again dominates in engagement, but Sports and Travel and Vlog content perform particularly well in both view rate and engagement. This suggests that audiences in South America are highly engaged with both sports-related and travel-based content.

Oceania displays a trend similar to other regions, with Entertainment content having the highest engagement rate. However, the view rate for entertainment is relatively lower, showing that audiences are highly engaged but may be smaller in number.

In Africa, as in Oceania, Entertainment content continues to lead with the highest engagement rate, though it has a more moderate view rate compared to other regions. This indicates that while entertainment content performs well, it may not reach as many viewers as it does in Europe or Asia.

Overall, the scatter plots indicate that Entertainment consistently drives high engagement across all regions, but Sports, Music, Beauty and Health, and Travel and Vlog show varying levels of success depending on the region. Asia and Europe have strong performances in Music, while South America and Africa show a growing interest in sports and travel content.

Brands should tailor their strategies based on these regional insights, focusing on entertainment and sports for broader appeal and targeting niche categories like beauty, music, or travel for more specific demographic groups.

Engagement Rate Across Markets and Content Types

Code
# 2. Engagement Rate Across Markets and Content Types
# engagement rate by content type and continent
engagement_by_content_continent <- smd_insights %>%
  filter(!is.na(`Content Topic`), !is.na(`Engagement %`), !is.na(Continent)) %>%
  group_by(`Content Topic`, Continent) %>%
  summarise(
    Avg_Engagement_Rate = mean(`Engagement %`, na.rm = TRUE),
    .groups = "drop"
  )

# add custom hover label
engagement_by_content_continent <- engagement_by_content_continent %>%
  mutate(hover_text = paste0(
    "<b>Continent:</b> ", Continent, "<br>",
    "<b>Content:</b> ", `Content Topic`, "<br>",
    "<b>Average ER:</b> ", round(Avg_Engagement_Rate, 2), "%"
  ))

# create heatmap with red palette
heatmap_engagement <- ggplot(engagement_by_content_continent, aes(
  x = Continent,
  y = `Content Topic`,
  fill = Avg_Engagement_Rate,
  text = hover_text
)) +
  geom_tile(color = "white") +
  scale_fill_gradient(
    low = "#FFE5E0",
    high = "#B22222"
  ) +
  labs(
    title = "Content Effectiveness Across Continents",
    fill = "Engagement Rate (%)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.title = element_blank(),
    axis.text.x = element_text(hjust = 1, size = 8),
    axis.text.y = element_text(size = 7)
  )

interactive_heatmap <- ggplotly(heatmap_engagement, tooltip = "text", width = 1000) %>%
  layout(
    margin = list(l = 100, r = 40, b = 100, t = 80)
  )

interactive_heatmap

The heatmap above provides a clear visualisation of content effectiveness across continents, with a focus on engagement rate for different content types. It offers valuable insights into which types of content are driving higher engagement in each region, allowing brands to tailor their marketing strategies based on regional preferences.

The heatmap clearly demonstrates that regional content preferences differ significantly across continents. In North America, content related to sports, brands and collaborations, and creative arts shows higher engagement, making it a key region for sports-related and brand-centric campaigns. On the other hand, Asia shows strong performance for technology and beauty and health content, while South America shines for cuisine content. This information is invaluable for brands to strategically plan their influencer marketing campaigns based on regional content preferences and audience engagement.
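Each tile in the heatmap is the mean engagement rate for one content type in one continent. The sketch below reproduces that aggregation on made-up values and then picks the strongest content type per continent; the data frame `toy` and all numbers in it are hypothetical.

```r
# Made-up engagement rates (in percent) for a few accounts
toy <- data.frame(
  continent      = c("Asia", "Asia", "Asia", "Europe", "Europe"),
  topic          = c("Technology", "Technology", "Music", "Music", "Sports"),
  engagement_pct = c(4, 6, 3, 2, 8)
)

# Long form: one row per (topic, continent) cell, i.e. one heatmap tile
cells <- aggregate(engagement_pct ~ topic + continent, data = toy, FUN = mean)

# Wide form: rows are topics, columns are continents (NA where no data)
heat_matrix <- tapply(toy$engagement_pct, list(toy$topic, toy$continent), mean)

# Content type with the highest mean engagement in each continent
top_topic <- apply(heat_matrix, 2, function(col) rownames(heat_matrix)[which.max(col)])

print(heat_matrix)
print(top_topic)
```

Note that a mean over accounts treats every account equally regardless of reach, so a cell backed by only a few accounts can appear very dark or very light.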

Platform Engagement Performance by Continent

Code
# 3. Platform Performance Across Continents (Engagement Rate)

# aggregate engagement rate
platform_engagement <- smd_insights %>%
  filter(!is.na(`Media Platform`), !is.na(Continent), !is.na(`Engagement %`)) %>%
  group_by(Continent, `Media Platform`) %>%
  summarise(
    Avg_Engagement_Rate = mean(`Engagement %`, na.rm = TRUE),
    .groups = "drop"
  )

# order continents by total engagement
continent_order <- platform_engagement %>%
  group_by(Continent) %>%
  summarise(total = sum(Avg_Engagement_Rate)) %>%
  arrange(desc(total)) %>%
  pull(Continent)

platform_engagement$Continent <- factor(platform_engagement$Continent, levels = continent_order)

# create stacked bar
platform_stacked_bar <- ggplot(platform_engagement, aes(
  x = Avg_Engagement_Rate,
  y = Continent,
  fill = `Media Platform`,
  text = paste0(
    "<b>Continent:</b> ", Continent, "<br>",
    "<b>Platform:</b> ", `Media Platform`, "<br>",
    "<b>Avg Engagement Rate:</b> ", round(Avg_Engagement_Rate, 2), "%"
  )
)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(
    values = c(
      "Youtube" = "#e41a1c",
      "Tiktok" = "#69c9D0",
      "Instagram" = "#D02A7B"
    )
  ) +
  labs(
    title = "Platform Engagement Performance by Continent",
    x = "Avg Engagement Rate (%)",
    y = "Continent",
    fill = "Platform"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "bottom",
    axis.title.x = element_text(size = 9),
    axis.title.y = element_blank(),    
    axis.text.y = element_text(size = 8),
    axis.text.x = element_text(size = 9)
  )

interactive_platform_chart <- ggplotly(platform_stacked_bar, tooltip = "text")
interactive_platform_chart

The chart above shows platform engagement performance by continent, highlighting the average engagement rates of Instagram, TikTok, and YouTube across different regions. Instagram leads in engagement across most continents, with particularly high performance in North America, South America, and Europe, suggesting it remains the dominant platform for audience interaction. TikTok, while growing rapidly, shows strong performance in North and South America and Asia, especially among younger audiences, but still lags behind Instagram in overall engagement. YouTube, on the other hand, holds strong in regions like Africa and Asia, though it generally shows lower engagement rates compared to Instagram and TikTok, especially in regions where short-form content is more popular.

In regions like North America and Oceania, Instagram is the clear leader, while TikTok is catching up with increasing engagement, especially in younger demographics. YouTube shows the highest engagement in Africa, but generally has lower engagement rates in other regions compared to Instagram and TikTok. This pattern reflects the ongoing shift towards more visually engaging and shorter content, making Instagram and TikTok more appealing for interactive campaigns. These insights provide brands with valuable information on which platform to prioritise based on regional preferences and engagement trends.
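One way to turn the per-continent averages into a platform shortlist is to rank the platforms within each continent and keep the leader. The sketch below does this on made-up averages; the data frame `toy` and its values are hypothetical and only mimic the shape of the aggregated data.

```r
# Made-up average engagement rates (in percent) per continent and platform
toy <- data.frame(
  continent      = c("Africa", "Africa", "Africa", "Oceania", "Oceania", "Oceania"),
  platform       = rep(c("Instagram", "Tiktok", "Youtube"), times = 2),
  avg_engagement = c(2.0, 1.5, 3.0, 4.0, 2.5, 1.0)
)

# Rank platforms within each continent (1 = highest average engagement)
toy$rank <- ave(-toy$avg_engagement, toy$continent, FUN = rank)

# Keep the leading platform per continent
leaders <- toy[toy$rank == 1, c("continent", "platform", "avg_engagement")]
print(leaders)
```

Keeping the full ranking rather than only the leader also shows how close the runner-up is, which matters when two platforms are nearly tied in a region.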

Conclusion

In conclusion, the analysis of content types, regional preferences, and platform performance across continents highlights the importance of a data-driven approach to marketing. Entertainment consistently proves to be the most engaging content type across all regions, making it a powerful tool for brands aiming to reach large audiences and drive significant interaction. However, regions like South America and Africa show a rising interest in sports and travel content, while Asia and Europe demonstrate a strong affinity for music and technology. By understanding these trends, brands can strategically tailor their campaigns to align with local content preferences, enhancing engagement and view rates.

In terms of platform performance, Instagram remains the leader in engagement across most continents, followed by TikTok, which is growing rapidly, particularly among younger demographics in North and South America and Asia. YouTube, while showing higher engagement in regions like Africa and Asia, lags behind in areas dominated by short-form content. This underscores the shift towards more visually engaging and short-form content, which Instagram and TikTok capitalise on.

For brands, understanding regional content preferences, as well as platform-specific engagement trends, offers a competitive edge in designing targeted marketing strategies. By focusing on entertainment and sports for broader reach, and leveraging niche content types like beauty, music, or travel for specific audiences, brands can ensure they are reaching the right people in the most effective way. In this ever-evolving digital landscape, these insights are essential for maximising return on investment and staying ahead in the game.